GPy

GPy is a framework for Gaussian process based applications. It is designed for speed and reliability. Its functionality rests on three main pillars:

  • Ease of use
  • Reproducibility
  • Scalability

In this tutorial we will look at each of these three pillars, so that you can use Gaussian processes with peace of mind and without the complications of cutting-edge research code.


In [1]:
import GPy, numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

Ease of use

GPy handles model parameters through its own built-in parameterized framework. This framework lets you access and set parameters in an intelligent and intuitive way.


In [2]:
X = np.random.uniform(0, 10, (200, 1))
f = np.sin(.3*X) + .3*np.cos(1.3*X)
f -= f.mean()
Y = f+np.random.normal(0, .1, f.shape)

In [3]:
plt.scatter(X, Y)


Out[3]:
<matplotlib.collections.PathCollection at 0x11291b290>

In [4]:
m = GPy.models.GPRegression(X, Y)
m


Out[4]:

Model: GP regression
Log-likelihood: -198.335478428
Number of Parameters: 3
Number of Optimization Parameters: 3
Updates: True

GP_regression. Value Constraint Prior Tied to
rbf.variance 1.0 +ve
rbf.lengthscale 1.0 +ve
Gaussian_noise.variance 1.0 +ve

Changing parameters is as easy as assigning new values to the respective parameter:


In [5]:
m.rbf.lengthscale = 1.5
m


Out[5]:

Model: GP regression
Log-likelihood: -195.694049324
Number of Parameters: 3
Number of Optimization Parameters: 3
Updates: True

GP_regression. Value Constraint Prior Tied to
rbf.variance 1.0 +ve
rbf.lengthscale 1.5 +ve
Gaussian_noise.variance 1.0 +ve

The whole model is updated automatically whenever a parameter changes, without you having to intervene.

Change some parameters and plot the results using the model's plot() function.

What do the different parameters change in the result?


In [6]:
# Type your code here
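One way to build intuition for the question above is to evaluate the RBF covariance by hand. The sketch below is plain NumPy rather than GPy code: rbf_kernel is a helper written only for this illustration, and it assumes the standard squared-exponential form k(x, x') = variance * exp(-(x - x')^2 / (2 * lengthscale^2)).

```python
import numpy as np

def rbf_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between the rows of A (n, 1) and B (m, 1).

    Illustrative helper, not part of GPy.
    """
    sq_dist = (A - B.T) ** 2  # pairwise squared distances, shape (n, m)
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

x0 = np.array([[0.0]])
x1 = np.array([[1.0]])

# Covariance between two inputs one unit apart, for several lengthscales.
for ls in (0.5, 1.0, 1.5):
    k = rbf_kernel(x0, x1, lengthscale=ls)[0, 0]
    print("lengthscale = %.1f: k(0, 1) = %.3f" % (ls, k))

# The kernel variance is the prior variance of the function at any single input.
prior_var = rbf_kernel(x0, x0, variance=2.0)[0, 0]
print("variance = 2.0: k(0, 0) = %.1f" % prior_var)
```

A longer lengthscale keeps distant inputs correlated, which gives smoother functions, while the kernel variance scales the amplitude of the function. The Gaussian noise variance controls how much of the scatter around the function is attributed to observation noise.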

The parameters can be optimized using gradient-based optimization. The optimization routines are taken from scipy. To run the optimization on a GPy model, call the model's own optimize method.
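To make the objective concrete, here is a minimal sketch of what such an optimization does for GP regression with an RBF kernel: it minimizes the negative log marginal likelihood over the three positive parameters. This is not GPy's implementation. The function name, the fixed random seed, the smaller data set, and the bounds are assumptions made for the illustration, and scipy approximates the gradients by finite differences here, whereas GPy supplies analytic gradients.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data in the style of the cells above (seed and size chosen for the sketch).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, (50, 1))
f = np.sin(.3 * X) + .3 * np.cos(1.3 * X)
Y = f - f.mean() + rng.normal(0, .1, f.shape)

def neg_log_marginal_likelihood(log_params):
    """Negative log marginal likelihood of GP regression with an RBF kernel."""
    # Working in log space keeps all three parameters positive.
    kern_var, lengthscale, noise_var = np.exp(log_params)
    K = kern_var * np.exp(-0.5 * (X - X.T) ** 2 / lengthscale ** 2)
    Ky = K + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # Ky^{-1} Y
    return (0.5 * float(np.sum(Y * alpha))        # data fit term
            + float(np.sum(np.log(np.diag(L))))   # 0.5 * log det(Ky)
            + 0.5 * len(X) * np.log(2 * np.pi))   # normalizing constant

start = np.zeros(3)  # all parameters equal to 1.0
# Bounds in log space keep the covariance matrix well conditioned.
result = minimize(neg_log_marginal_likelihood, start,
                  method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 3)

print("start objective:", neg_log_marginal_likelihood(start))
print("final objective:", result.fun)
print("kernel variance, lengthscale, noise variance:", np.exp(result.x))
```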


In [7]:
m.optimize(messages=1)



In [8]:
_ = m.plot()



In [9]:
# You can use different kernels on the data.
# Try out three different kernels and plot the result after optimizing the GP.
# See the available kernels using GPy.kern.<tab>

In [10]:
# Type your code here

Reproducibility

GPy has built-in save and load functionality, allowing you to pickle a model with all its parameters and data in a single file. This is useful when transferring models to another location, or when rerunning models with different initializations.

Try saving a model using the model's pickle(<name>) function and load it again using GPy.load(<name>). The loaded model is fully functional and can be used as usual.


In [11]:
# Type your code here

We have put a lot of effort into stability of execution. Try randomizing a model using its randomize() function, which randomizes the model's parameters. After optimization, the result should be very close to previous model optimizations.


In [12]:
# Type your code here

Scalability

  • GPy's parameterized framework can handle as many parameters as you like, and it is memory and speed efficient when setting parameters because it keeps only one copy of the parameters in memory.

  • Many scalable Gaussian process methods are implemented in GPy. Have a look at:


In [13]:
GPy.core.SparseGP?

In [24]:
GPy.core.SVGP?

We can easily run a sparse GP on the above data by using one of the wrapper models that GPy provides:

GPy.models.<tab>

Use GPy.models.SparseGPRegression to run a sparse GP on the above data:


In [15]:
# Type your code here
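Sparse GP methods generally summarize the data through a small set of inducing inputs. The sketch below shows only the low-rank kernel approximation that underlies many such methods, in plain NumPy. It is not GPy's inference code: rbf_kernel, low_rank_error, the evenly spaced inducing inputs Z, the lengthscale, and the jitter value are all assumptions made for the illustration.

```python
import numpy as np

def rbf_kernel(A, B, variance=1.0, lengthscale=2.0):
    """Squared-exponential covariance between the rows of A (n, 1) and B (m, 1)."""
    return variance * np.exp(-0.5 * (A - B.T) ** 2 / lengthscale ** 2)

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, (200, 1))
K_full = rbf_kernel(X, X)  # 200 x 200 covariance of all the data

def low_rank_error(num_inducing):
    """Largest absolute error of the approximation K_nm K_mm^{-1} K_mn."""
    Z = np.linspace(0, 10, num_inducing)[:, None]  # evenly spaced inducing inputs
    K_mm = rbf_kernel(Z, Z) + 1e-8 * np.eye(num_inducing)  # jitter for stability
    K_nm = rbf_kernel(X, Z)
    K_approx = K_nm.dot(np.linalg.solve(K_mm, K_nm.T))
    return np.abs(K_full - K_approx).max()

for m in (3, 15):
    print("%2d inducing inputs: max abs error = %.2e" % (m, low_rank_error(m)))
```

Because only the small num_inducing by num_inducing matrix has to be factorized, the dominant cost grows linearly with the number of data points rather than cubically.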

Use GPy.core.SVGP to run the above data. This is a very recently integrated feature and therefore has no thin wrapper in the GPy.models module yet.

GPy's likelihoods are located in GPy.likelihoods.